Submit a knitted PDF file on Moodle. Be sure to show all of your R code in addition to your output, plots, and written responses.

Web scraping

  1. Read in the table of data found at the link here and create a scatterplot of land area versus the 2019 estimated population. Also do any necessary tidying: remove extraneous information from the cells, parse columns into the proper format, and so on. A few things to look for:

Further hints:
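A minimal sketch of the read-and-tidy workflow, run on a small made-up HTML table rather than the linked page. The place names, column names, values, and bracketed footnote markers are all hypothetical, so the real table will need its own cleaning choices:

```r
library(rvest)    # read_html(), html_table()
library(dplyr)
library(stringr)
library(readr)    # parse_number()
library(ggplot2)

# Hypothetical stand-in for the scraped page
page <- read_html('
<table>
  <tr><th>Place</th><th>Land area (sq mi)</th><th>2019 estimate</th></tr>
  <tr><td>Alpha[a]</td><td>1,234.5</td><td>56,789[1]</td></tr>
  <tr><td>Beta</td><td>987.0</td><td>12,345</td></tr>
</table>')

raw <- html_table(page)[[1]]  # first (and only) table on the page

places <- raw %>%
  setNames(c("place", "land_area", "pop_2019")) %>%  # short, syntactic names
  # Treat every cell as text and drop bracketed footnote markers such as [a] or [1]
  mutate(across(everything(), ~ str_remove_all(as.character(.x), "\\[.*?\\]"))) %>%
  # Parse the numeric columns; parse_number() ignores grouping commas
  mutate(land_area = parse_number(land_area),
         pop_2019 = parse_number(pop_2019))

# Land area versus 2019 estimated population
p <- ggplot(places, aes(x = pop_2019, y = land_area)) +
  geom_point() +
  labs(x = "Estimated population (2019)", y = "Land area (sq mi)")
p
```

Stripping the footnotes before calling `parse_number()` matters: a marker placed in front of a number (for example `[1]56,789`) would otherwise be read as the number 1.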

Hello! I’m making a change!

2 + 2
## [1] 4

Now I’m making a change from GitHub!!!

  2. Following the examples from class, use the rvest package to scrape data on the top 50 grossing films of 2018 from the link here. Generate a tibble that contains the title, gross, star rating (imdbscore), and metascore for each of the 50 films. Then create a scatterplot of star rating versus gross. A couple of hints:
  3. Identify which of the top 50 films from 2018 had the biggest discrepancy between critics (metascore) and viewers (star rating).

  4. (5 points) Push your Rmd file with your HW15 solutions, along with the knitted PDF file, to your MSCS264-HW15 repository in your GitHub account. So that I can check your work, make the repository private (good practice for homework), but add me (username = lfbv) as a collaborator under Settings > Collaborators.
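A sketch of the film-scraping and discrepancy steps, assuming markup in which each film sits in its own container. The HTML, the class names (`film`, `title`, `gross`, `rating`, `metascore`), and the numbers are made up, so use SelectorGadget or your browser's inspector to find the real selectors. Scraping inside each film container with `html_element()` (singular) returns `NA` for a film with no metascore, which keeps the columns aligned; scraping each field page-wide with `html_elements()` would instead return a shorter metascore vector that no longer lines up with the titles. Because metascore runs from 0 to 100 and the star rating from 0 to 10, the sketch divides metascore by 10 before taking the difference:

```r
library(rvest)
library(dplyr)
library(readr)
library(ggplot2)

# Hypothetical markup; the third film has no metascore
page <- read_html('
<div class="film"><h3 class="title">Film A</h3>
  <span class="gross">$700.06M</span>
  <span class="rating">7.9</span><span class="metascore">88</span></div>
<div class="film"><h3 class="title">Film B</h3>
  <span class="gross">$89.30M</span>
  <span class="rating">6.5</span><span class="metascore">41</span></div>
<div class="film"><h3 class="title">Film C</h3>
  <span class="gross">$45.12M</span>
  <span class="rating">7.1</span></div>')

films <- html_elements(page, "div.film")  # one node per film

# html_element() returns one result per film, with a missing node when a field is absent
film_df <- tibble(
  title     = films %>% html_element(".title") %>% html_text(trim = TRUE),
  gross     = films %>% html_element(".gross") %>% html_text(trim = TRUE) %>% parse_number(),  # "$700.06M" -> 700.06 (millions)
  imdbscore = films %>% html_element(".rating") %>% html_text(trim = TRUE) %>% as.numeric(),
  metascore = films %>% html_element(".metascore") %>% html_text(trim = TRUE) %>% as.numeric()  # NA when missing
)

# Scatterplot of star rating versus gross
p <- ggplot(film_df, aes(x = gross, y = imdbscore)) +
  geom_point() +
  labs(x = "Gross (millions of dollars)", y = "Star rating (imdbscore)")
p

# Put both scores on a 0-10 scale, then rank films by the size of the gap
ranked <- film_df %>%
  mutate(discrepancy = imdbscore - metascore / 10) %>%
  arrange(desc(abs(discrepancy)))  # films with a missing metascore sort last
ranked
```

A positive `discrepancy` means viewers liked the film more than critics did; a negative one means the reverse.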

Map example

vaccine_data <- read_csv("Data/exam1data.csv") 
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   State = col_character(),
##   Date = col_date(format = ""),
##   people_vaccinated = col_double(),
##   total_distributed = col_double(),
##   share_doses_used = col_double(),
##   people_vaccinated_per100 = col_double(),
##   Governor = col_character(),
##   Region = col_character(),
##   month0 = col_double(),
##   day0 = col_double(),
##   year0 = col_double(),
##   est_population = col_double(),
##   dist_per_person = col_double(),
##   prev_day = col_double(),
##   daily_vaccinated = col_double()
## )
vacc_mar13 <- vaccine_data %>%
  filter(Date == "2021-03-13") %>%
  select(State, Date, people_vaccinated_per100, share_doses_used, Governor) %>%
  # Match the lowercase state names used in the `region` column of map_data("state")
  mutate(State = str_replace(State, " State", ""),
         State = str_to_lower(State))


library(viridis) # for color schemes
## Loading required package: viridisLite
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
map_data("state") %>%
  left_join(vacc_mar13, by = c("region" = "State")) %>%
  ggplot(mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = people_vaccinated_per100), color = "black") +
  coord_map() +          # Scales longitude and latitude so the state shapes look correct (needs the mapproj package).
  theme_void() +         # This theme gives a really clean look, with no axes or gridlines.
  scale_fill_viridis() + # Change the fill scale for a different color scheme.
  labs(title = "Cumulative People Vaccinated per 100 population\nMarch 13, 2021",
       fill = "People Vacc.\nper 100 pop.")
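Before mapping, it helps to check which states fail to match a `region` in `map_data("state")`: a name that does not match leaves its data off the map, and the corresponding polygon (if there is one) is filled as missing. A minimal sketch with a made-up stand-in for `vacc_mar13` (the state names and values are hypothetical) uses `anti_join()` to list the unmatched names. Note that the maps package's `"state"` database covers only the contiguous states plus the District of Columbia, so Alaska and Hawaii never match:

```r
library(dplyr)
library(stringr)
library(ggplot2)  # map_data(); requires the maps package to be installed

states_map <- map_data("state")  # lowercase state names in the `region` column

# Hypothetical stand-in for vacc_mar13 (made-up values)
toy <- tibble(
  State = c("Minnesota", "New York State", "Alaska"),
  people_vaccinated_per100 = c(20, 25, 30)
) %>%
  mutate(State = str_replace(State, " State", ""),
         State = str_to_lower(State))

# Rows of the data with no matching polygon
unmatched <- anti_join(toy, states_map, by = c("State" = "region"))
unmatched$State
```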

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
(vaccine_data %>%
   group_by(Region, Date) %>%
   # Pool counts across the states in each region, then convert back to a rate;
   # summing per-100 rates across states would overstate the regional figure.
   summarize(people_vacc_per100 = 100 * sum(people_vaccinated) / sum(est_population),
             .groups = "drop") %>%
   ggplot(mapping = aes(x = Date, y = people_vacc_per100, color = Region)) +
   geom_point() +
   geom_line() +
   labs(title = "Cumulative People Vaccinated per 100 population",
        y = "People/100 Population",
        x = "Date")) %>%
  ggplotly()
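Since `people_vaccinated_per100` is already a rate, adding it up across the states in a region does not give people vaccinated per 100 residents of that region. A toy check with made-up numbers for two hypothetical states contrasts the sum of the state rates with the population-weighted rate (total vaccinated divided by total population, times 100), reusing the `people_vaccinated` and `est_population` column names from the data above:

```r
library(dplyr)

# Two hypothetical states in one region (made-up numbers)
toy <- tibble(
  Region = c("Midwest", "Midwest"),
  people_vaccinated = c(1000, 3000),
  est_population = c(10000, 10000)
) %>%
  mutate(people_vaccinated_per100 = 100 * people_vaccinated / est_population)  # 10 and 30

regional <- toy %>%
  group_by(Region) %>%
  summarize(sum_of_rates = sum(people_vaccinated_per100),                        # 10 + 30 = 40
            weighted_rate = 100 * sum(people_vaccinated) / sum(est_population),  # 100 * 4000 / 20000 = 20
            .groups = "drop")  # also silences the "grouped output" message
regional
```

The region as a whole has 20 people vaccinated per 100 residents, not 40.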
library(leaflet)
airbnb.df <- read_csv("Data/airbnbData_full.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   Title = col_character(),
##   baseurl = col_character(),
##   AboutListing = col_character(),
##   HostName = col_character(),
##   MemberDate = col_character(),
##   BookInstantly = col_character(),
##   Cancellation = col_character(),
##   P_Cleaning = col_character(),
##   P_Deposit = col_character(),
##   P_ExtraPeople = col_character(),
##   P_Monthly = col_character(),
##   P_Weekly = col_character(),
##   R_CI = col_character(),
##   R_acc = col_character(),
##   R_clean = col_character(),
##   R_comm = col_character(),
##   R_loc = col_character(),
##   R_val = col_character(),
##   RespRate = col_character(),
##   RespTime = col_character()
##   # ... with 7 more columns
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 4 parsing failures.
##  row           col expected    actual                       file
## 1961 S_Accomodates a double Not Found 'Data/airbnbData_full.csv'
## 1961 S_NumBeds     a double Not Found 'Data/airbnbData_full.csv'
## 1993 S_Accomodates a double Not Found 'Data/airbnbData_full.csv'
## 1993 S_NumBeds     a double Not Found 'Data/airbnbData_full.csv'
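# Declare the listing descriptions as UTF-8, then drop any invalid byte
# sequences so the text displays cleanly in the map popups
Encoding(airbnb.df$AboutListing) <- "UTF-8"
airbnb.df$AboutListing <- iconv(airbnb.df$AboutListing,
                                from = "UTF-8", to = "UTF-8", sub = "")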

# This part makes the map!
leaflet() %>%
    addTiles() %>% 
    setView(lng = mean(airbnb.df$Long), lat = mean(airbnb.df$Lat), 
            zoom = 13) %>% 
    addCircleMarkers(data = airbnb.df,
        lat = ~ Lat, 
        lng = ~ Long, 
        popup = ~ AboutListing, 
        radius = ~ S_Accomodates,  # These last options describe how the circles look
        weight = 2,
        color = "red", 
        fillColor = "yellow")
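The parsing warnings above show that `S_Accomodates` is `NA` for rows 1961 and 1993, and `mean()` returns `NA` if any `Lat` or `Long` value is missing. A minimal sketch on a few made-up listings drops rows without coordinates before centering the map and substitutes a fallback radius for missing capacities. The `radius` column and the fallback value of 1 are illustrative choices, not part of the original code:

```r
library(dplyr)
library(leaflet)

# Hypothetical listings: one lacks a capacity, one lacks coordinates
toy <- tibble(
  Lat = c(44.94, 44.95, NA),
  Long = c(-93.17, -93.19, -93.18),
  S_Accomodates = c(4, NA, 2),
  AboutListing = c("Cozy studio", "Sunny duplex", "Quiet room")
)

listings <- toy %>%
  filter(!is.na(Lat), !is.na(Long)) %>%        # markers need coordinates
  mutate(radius = coalesce(S_Accomodates, 1))  # fallback circle size when capacity is missing

m <- leaflet(listings) %>%
  addTiles() %>%
  setView(lng = mean(listings$Long), lat = mean(listings$Lat), zoom = 13) %>%
  addCircleMarkers(lat = ~ Lat, lng = ~ Long,
                   popup = ~ AboutListing,
                   radius = ~ radius,
                   weight = 2, color = "red", fillColor = "yellow")
m
```

Filtering first means the same cleaned tibble drives both `setView()` and the markers, so the map center is never computed from missing coordinates.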